Learning Object Detectors from Scratch with Gated Recurrent Feature Pyramids

Authors

  • Zhiqiang Shen
  • Honghui Shi
  • Rogério Schmidt Feris
  • Liangliang Cao
  • Shuicheng Yan
  • Ding Liu
  • Xinchao Wang
  • Xiangyang Xue
  • Thomas S. Huang
Abstract

In this paper, we propose a gated recurrent feature pyramid for the problem of learning object detection from scratch. Our approach is motivated by the recent work on the deeply supervised object detector (DSOD) [24], but explores a new network architecture that dynamically adjusts the supervision intensities of intermediate layers for various scales in object detection. The benefits of the proposed method are two-fold. First, we propose a recurrent feature-pyramid structure that squeezes rich spatial and semantic features into a single prediction layer, which further reduces the number of parameters to learn (DSOD needs to learn 1/2, but our method needs only 1/3). Our new model is therefore better suited to learning from scratch and converges faster than DSOD (using only 50% of the iterations). Second, we introduce a novel gate-controlled prediction strategy that adaptively enhances or attenuates supervision at different scales based on the input object size. As a result, our model is more suitable for detecting small objects. To the best of our knowledge, our model is the best-performing one for learning object detection from scratch. On the PASCAL VOC 2012 comp3 leaderboard (which compares object detectors trained only with PASCAL VOC data), our method demonstrates a significant performance jump, from the previous 64% to our 77% (VOC 07++12) and 72.5% (VOC 12). We also evaluate our method on the PASCAL VOC 2007, PASCAL VOC 2012 and MS COCO datasets, and find that the accuracy of our learning-from-scratch method can even beat many state-of-the-art detection methods that use models pre-trained on ImageNet. Code is available at: https://github.com/szq0214/GRP-DSOD.
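
The abstract does not spell out how the gate is computed, so the following is only a minimal sketch of one common way to enhance or attenuate features per channel, assuming a squeeze-and-excitation-style gate. The function name `gate_channels` and its `weights` and `biases` parameters are hypothetical and are not taken from the paper or its repository.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def gate_channels(features, weights, biases):
    """Scale each channel of a feature map by a learned gate in (0, 1).

    ASSUMPTION: this is an illustrative channel gate, not the authors' design.
    features: list of C channels, each an H x W list of lists of floats.
    weights, biases: one scalar per channel (hypothetical learned parameters).
    Returns (gated_features, gates).
    """
    # Squeeze: summarize each channel by its global average.
    pooled = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in features
    ]
    # Gate: map each channel summary to a value between 0 and 1.
    gates = [sigmoid(w * p + b) for w, p, b in zip(weights, pooled, biases)]
    # Scale: a gate near 1 keeps the channel, a gate near 0 suppresses it.
    gated = [
        [[g * value for value in row] for row in channel]
        for channel, g in zip(features, gates)
    ]
    return gated, gates
```

In a detector, one such gate per prediction scale would let the network emphasize the channels that matter for objects of that size; whether the paper's gate takes this exact form is not stated in the abstract.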


Similar resources

DenseNet: Implementing Efficient ConvNet Descriptor Pyramids

Convolutional Neural Networks (CNNs) can provide accurate object classification. They can be extended to perform object detection by iterating over dense or selected proposed object regions. However, the runtime of such detectors scales as the total number and/or area of regions to examine per image, and training such detectors may be prohibitively slow. However, for some CNN classifier topolog...


Contours Extraction Using Line Detection and Zernike Moment

Most contour detection methods suffer from drawbacks such as noise, occlusion, shifting, scaling and rotation of objects in the image, which reduce recognition accuracy. To solve this problem, this paper utilizes Zernike Moment (ZM) and Pseudo Zernike Moment (PZM) to extract object contour features in all situations such as rotation, scaling and shifting of object i...


Learning and Selecting Features Jointly with Point-wise Gated Boltzmann Machines

Unsupervised feature learning has emerged as a promising tool in learning representations from unlabeled data. However, it is still challenging to learn useful high-level features when the data contains a significant amount of irrelevant patterns. Although feature selection can be used for such complex data, it may fail when we have to build a learning system from scratch (i.e., starting from t...


Cortex-inspired Recurrent Networks for Developmental Visual Attention and Recognition

By Matthew Luciw. It is unknown how the brain self-organizes its internal wiring without a holistically-aware central controller. How does the brain develop internal object representations for a massive number of objects? How do such representations enable tightly intertwined attention and recognition in the pre...


Biologically inspired Bayesian approach for learning object categories from few training examples

In this work we present a biologically inspired algorithm for learning object categories that uses Bayesian inference to integrate information within and across fixations. In our model, an object is represented as a collection of features of specific classes arranged at specific locations with respect to the location of the fixation point. Even though the number of feature detectors that we use...



Journal title:
  • CoRR

Volume abs/1712.00886

Publication date: 2017